AITopics

Industry: Health & Medicine (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsJan-26-2025, 03:03:03 GMT

Review for NeurIPS paper: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

Weaknesses: My central concern for this paper is the misalignment between the motivation and methodology. As motivation, the authors argue that self-supervised CV and **NLP** "algorithms are not effective for tabular data." The proposed model, though, is effectively the binary masked language model whose variants pervade self-supervised NLP research (e.g. Granted, instead of masking words, the proposed models are masking tabular values, but this is performing a very similar pretext task. In fact, there is concurrent work that learns tabular representations using a BERT model [1].

self-and semi-supervised learning, tabular data, tabular domain, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Neural Information Processing SystemsJan-26-2025, 03:02:56 GMT

Review for NeurIPS paper: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

This paper proposes a new reconstruction loss for unsupervised training of representations. This loss extends auto-encoders via a pretext task that uses the marginal distribution of features. The reviewers were unanimous in their decision to accept this paper.

neurips paper, self-and semi-supervised learning, tabular domain, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.85)

Neural Information Processing SystemsOct-10-2024, 15:25:48 GMT

VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

self-and semi-supervised learning, semi-supervised learning framework, tabular domain, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.95)

Lee, Seung-eon, Lee, Sang-Chul

TablEye: Seeing small Tables through the Lens of Images

arXiv.org Artificial IntelligenceJul-3-2023

The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations, property of data and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated a superior performance by outstripping the TabLLM in a 4-shot task with a maximum 0.11 AUC and a STUNT in a 1- shot setting, where it led on average by 3.17% accuracy.

artificial intelligence, deep learning, machine learning, (18 more...)

2307.02491

Country: Asia > South Korea > Daegu > Daegu (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Energy (0.93)
Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kireev, Klim, Kulynych, Bogdan, Troncoso, Carmela

Adversarial Robustness for Tabular Data through Cost and Utility Awareness

arXiv.org Artificial IntelligenceFeb-24-2023

Many safety-critical applications of machine learning, such as fraud or abuse detection, use data in tabular domains. Adversarial examples can be particularly damaging for these applications. Yet, existing works on adversarial robustness primarily focus on machine-learning models in image and text domains. We argue that, due to the differences between tabular data and images or text, existing threat models are not suitable for tabular domains. These models do not capture that the costs of an attack could be more significant than imperceptibility, or that the adversary could assign different values to the utility obtained from deploying different adversarial examples. We demonstrate that, due to these differences, the attack and defense methods used for images and text cannot be directly applied to tabular settings. We address these issues by proposing new cost and utility-aware threat models that are tailored to the adversarial capabilities and constraints of attackers targeting tabular domains. We introduce a framework that enables us to design attack and defense mechanisms that result in models protected against cost and utility-aware adversaries, for example, adversaries constrained by a certain financial budget. We show that our approach is effective on three datasets corresponding to applications for which adversarial examples can have economic and social implications.

artificial intelligence, machine learning, natural language, (19 more...)

2208.13058

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Gharasuie, Morteza Mohammady, Wang, Fenjiao

Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain

arXiv.org Artificial IntelligenceDec-1-2022

Recent semi-supervised and self-supervised methods have shown great success in the image and text domain by utilizing augmentation techniques. Despite such success, it is not easy to transfer this success to tabular domains. It is not easy to adapt domain-specific transformations from image and language to tabular data due to mixing of different data types (continuous data and categorical data) in the tabular domain. There are a few semi-supervised works on the tabular domain that have focused on proposing new augmentation techniques for tabular data. These approaches may have shown some improvement on datasets with low-cardinality in categorical data. However, the fundamental challenges have not been tackled. The proposed methods either do not apply to datasets with high-cardinality or do not use an efficient encoding of categorical data. We propose using conditional probability representation and an efficient progressively feature upgrading framework to effectively learn representations for tabular data in semi-supervised applications. The extensive experiments show superior performance of the proposed framework and the potential application in semi-supervised settings.

artificial intelligence, machine learning, representation, (18 more...)

2212.00892

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Wang, Zifeng, Sun, Jimeng

TransTab: Learning Transferable Tabular Transformers Across Tables

arXiv.org Artificial IntelligenceSep-16-2022

Tabular data (or tables) are the most widely used data format in machine learning (ML). However, ML models often assume the table structure keeps fixed in training and testing. Before ML modeling, heavy data cleaning is required to merge disparate tables with different columns. This preprocessing often incurs significant data waste (e.g., removing unmatched columns and samples). How to learn ML models from multiple tables with partially overlapping columns? How to incrementally update ML models as more columns become available over time? Can we leverage model pretraining on multiple distinct tables? How to train an ML model which can predict on an unseen table? To answer all those questions, we propose to relax fixed table structures by introducing a Transferable Tabular Transformer (TransTab) for tables. The goal of TransTab is to convert each sample (a row in the table) to a generalizable embedding vector, and then apply stacked transformers for feature encoding. One methodology insight is combining column description and table cells as the raw input to a gated transformer model. The other insight is to introduce supervised and self-supervised pretraining to improve model performance. We compare TransTab with multiple baseline methods on diverse benchmark datasets and five oncology clinical trial datasets. Overall, TransTab ranks 1.00, 1.00, 1.78 out of 12 methods in supervised learning, feature incremental learning, and transfer learning scenarios, respectively; and the proposed pretraining leads to 2.3% AUC lift on average over the supervised learning.

artificial intelligence, deep learning, machine learning, (16 more...)

2205.09328

Country: North America > United States > Illinois (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.49)
Health & Medicine > Therapeutic Area > Oncology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)